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SEMI-PARAMETRIC DYNAMIC TIME SERIES MODELLING 
WITH APPLICATIONS TO DETECTING NEURAL DYNAMICS 

By Fabio Rigat and Jim Q. Smith 

University of Warwick 

This paper illustrates novel methods for nonstationary time se- 
ries modeling along with their applications to selected problems in 
neuroscience. These methods are semi-parametric in that inferences 
are derived by combining sequential Bayesian updating with a non- 
parametric change-point test. As a test statistic, we propose a Kullback- 
Leibler (KL) divergence between posterior distributions arising from 
different sets of data. A closed form expression of this statistic is de- 
rived for exponential family models, whereas standard Markov chain 
Monte Carlo output is used to approximate its value and its criti- 
cal region for more general models. The behavior of one-step ahead 
predictive distributions under our semi-parametric framework is de- 
scribed analytically for a dynamic linear time series model. Condi- 
tions under which our approach reduces to fully parametric state- 
space modeling are also illustrated. We apply our methods to estimat- 
ing the functional dynamics of a wide range of neural data, including 
multi-channel electroencephalogram recordings, longitudinal behav- 
ioral experiments and in-vivo multiple spike trains recordings. The 
estimated dynamics are related to the presentation of visual stimuli, 
to the evaluation of a learning performance and to changes in the 
functional connections between neurons over a sequence of experi- 
ments. 

Introduction. Stochastic modeling of dynamic processes is often imple- 
mented via models having time-dependent parameters [Hamilton (1994), 
West and Harrison (1997), Friihwirth-Slmatter (2006)]. For instance, the 
coefficients of state-space (SS) and hidden Markov (HM) time series mod- 
els [Kalman (1960), West, Harrison and Migon (1985), West and Harrison 
(1997), Cappe, Moulines and Ryden (2005)] follow smooth Markovian pro- 
cesses defined either on their own past or on past values of other latent vari- 
ables, whereas those of change-point (CP) models [Muller (1992), Stephens 
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(1994), Loader (1996), Mira and Petrone (1996), Belisle et al. (1998), Fearn- 
head and Liu (2007)] describe pure jump processes. When these dynamics 
are specified appropriately, these time series models can effectively capture 
nonstationarities induced by switches among different dependence regimes 
[Hamilton (1990), Shumway and Stoffer (1991), Robert, Celeux and Diebolt 
(1993), Albert and Chib (1993), McCulloch and Tsay (1994), Kim (1994), 
Ghahramani and Hinton (2000), Fruhwirth-Shnatter (2001)], by smooth 
changes of the model parameters through time [Harrison and Stevens (1976), 
West and Harrison (1986)] or by the occurrence of abrupt changes in the 
data dependence structure [Page (1955), Smith (1975), Carlin, Gelfand and 
Smith (1992), Ferger (1995), Chib (1998)]. 

This paper illustrates theory and applications of a novel sequential method 
for estimating semi-parametrically the coefficients of time series models hav- 
ing time-dependent parameters. Our approach is motivated by applications 
where little is known about the factors driving the data dynamics. Here we 
focus on selected problems in neuroscience where the data exhibit periods of 
smooth change interlaced with occasional large jumps. We model this type 
of data by combining sequential Bayesian updating with a nonparametric 
change-point test. Sequential change-point testing is in fact a well estab- 
lished field which can be traced back at least to the seminal works of Page 
(1954), Kemp (1957), Barnard (1959) and Page (1961) in statistical process 
control. We propose testing for significant changes of a model's parame- 
ters using a novel Kullback-Leibler (KL) divergence [Kullback and Leibler 
(1951), Kullback (1997)] between their one-step ahead predictive distribu- 
tions. The null distribution of this KL statistic reflects the concentration of 
the joint posterior density when new data are generated using the assumed 
model likelihood with parameter values drawn from their current posterior 
distribution. The semi-parametric nature of our method stands in the fact 
that the value of this test statistic does not depend on the model's parame- 
ters, which are integrated out in the calculation of the KL divergence. 

With respect to the SS and HM families, our approach does not describe 
the parameters' dynamics using auxiliary regression equations depending 
on known predictors. With respect to CP models, we do not assume that 
parameter values between successive change points are constant. Instead, we 
induce a time-dependent parameter process by adopting different updating 
strategies depending on whether the KL statistic lies within its critical region 
or not. In the former case, the parameters' joint distribution is updated via 
Bayes' theorem. In the latter case, updating is carried out by matching the 
first two marginal moments of the current joint posterior probability density 
to the prior for the next time point. This second strategy, which does not 
carry the full information content of a posterior distribution to the future, 
substantiates the notion that a change point in the parameter values has 
been detected. 
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With respect to SS, HM and CP models, the advantages of our approach 
are twofold. First, in SS and HM models inferences and predictions are 
sensitive to the form of the state evolution equations [Fruhwirth-Shnatter 
(1995), Bengtsson and Cavanaugh (2006)]. Therefore, an exploratory semi- 
parametric approach is a natural choice for a first analysis of time series data 
when a specific parametrization of the likelihood function is chosen but no 
reliable information about the evolution of its parameters is available [Robin- 
son (1983), Hardle, Liitkepohl and Chen (1997)]. This is typically the case 
for many biological systems, where dynamic responses to novel experimen- 
tal conditions are difficult to anticipate. Second, the joint distribution of the 
model's parameters is updated also between successive change-points, allow- 
ing for a reduction of uncertainty and for smooth changes of the parameter 
estimates over time. 

From a computational perspective, our approach is motivated by observ- 
ing that fully Bayesian sequential inference for a model's time-dependent 
parameters and for a latent multiple change-point process is impractical un- 
less marginal likelihoods can be calculated explicitly. Otherwise, the Bayes 
factors measuring the strength of evidence in the data about the occur- 
rence of change-points can only be approximated numerically [Han and 
Carlin (2001)]. Current methods for calculating these approximations re- 
quire knowledge of normalizing constants which may be hard to obtain and 
they also require estimating the exact value of a posterior probability density 
at one point, which is ideally chosen as one of the posterior modes [New- 
ton and Raftery (1994), Gelfand and Dey (1994), Chib (1995), Chib (1998), 
Fruhwirth-Shnatter (2006)]. Our approach represents a practical alternative 
to these methods in that point estimates of a latent change-point process 
are derived without using marginal likelihoods. 

Section 1 of this paper includes its methodological developments. A gen- 
eral time series framework is introduced and the KL test is illustrated. A 
closed form expression of the KL statistic for exponential family models is 
derived and examples are presented. Markov chain Monte Carlo (MCMC) 
simulation [Gelfand and Smith (1990), Tierney (1994)] is used to approxi- 
mate the exact critical region of the KL statistic under the null hypothesis. 
This approximation is chosen as it only requires the assumed data sam- 
pling distribution and the standard MCMC output. We present a simulation 
study showing that the power of the KL change-point test is unaffected in 
practice by adopting these MCMC approximations when using a conjugate 
Bernoulli model. A sequential algorithm summarizing the computational 
steps involved in the implementation of our method is presented. The be- 
havior of location and spread of the one-step ahead predictive distributions 
arising from our method is described analytically for a conjugate Gaussian 
linear dynamic model. Conditions are given so that our semi-parametric ap- 
proach reduces to fully parametric state-space dynamic time series modeling. 
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In Sections 2, 3 and 4 our method is applied to estimating three different 
types of neural dynamics. First, we analyze a multivariate time series of 
electroencephalogram (EEG) recordings [Delorme et al. (2002), Makeig et 
al. (2002)] to reconstruct the time-varying functional relationships among 
different brain areas. Second, we estimate semi-parametrically a learning 
curve using a univariate binary time series arising from a longitudinal be- 
havioral experiment [Smith et al. (2004)]. Finally, our method is applied 
to estimating the functional dynamics of networks of neurons using in- vivo 
experimental multiple spike trains recordings [Buzsaki (2004)]. 

1. Sequential time series modeling and Kullback Leibler change-point 
testing. Let {Yi}fL 1 represent a sequence of ET-dimensional time series Yi € 
y of random variables j. f measured at the time points t = ti t i < ti t 2 < ■ • ■ < 
U,m with ti tTli < and k = 1, . . . , K. The distinction between the iV time 
series is relevant when we allow for the occurrence of time gaps between 
them. This situation arises, for instance, when ./V consecutive trials are run 
sequentially interposed by resting periods. When ni = 1 for all values of i, 
we effectively have a single i^-dimensional time series of length N measured 
at the time points tn. In this paper the time series data Yi = yi are assumed 
to be generated by a finite-dimensional model P(yi\9i^i, y 0: ^ -1 ^), such as a 
vector auto-regressive (VAR) model with shared coefficients within each 
of the ./V periods. The probability density f(9i-i\y^^ l ~ 1 ^) here represents a 
distribution of the model coefficients given the initial conditions y° and all 
past observations up to and including period i — 1. Note that, although we 
allow the parameter values to vary in time, neither the functional form of 
the likelihood nor the interpretation of its coefficients change over time. 

Within this framework, dynamic modeling consists of specifying a trans- 
fer map, taking as arguments the posterior density /(0j_i|y O: ^ -1 )), the time 
series data yi and possibly other fixed hyper-parameters a and returning 
the density f(0i\y O:l ) for i = 1, . . . , N . Various characterizations of analo- 
gous maps are given in Smith (1990, 1992). For instance, in standard state- 
space models, this transfer map is defined by indexing the prior distribution 
for 8i using the coefficients 9i-\ and a set of hyper-parameters. In Markov 
switching and finite mixture time series models, this transfer map is again 
derived by parametric modeling of the joint density of the coefficients 9i and 
conditional on the location of a sequence of change-points [Fruhwirth- 
Shnatter (2006)]. Here we provide an overview of a transfer map which 
integrates sequential Bayesian inference and change-point testing, leaving 
to Section 1.1 the detailed description of an appropriate test statistic. 

Let 0i-i be a current point estimate of the model's parameters at time 
tii. When i = l these are prior summaries, whereas for i > 1 these esti- 
mates incorporate evidence from past data as described below. If the data 
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yi are generated under significantly different parameter values with respect 
to period i — 1, we say a change-point has occurred. In this case we define a 
transfer prior 



taking as arguments the current parameter estimates and returning a prior 
density h(-) for the model's coefficients 0j. Among the many possible for- 
mulations of this prior, we let its hyperparameters be functions of the first 
two marginal moments of the current posterior density. Similar forms of 
prior moment matching have been used for dynamic point process modeling 
by Gamerman (1992) and for multi-process dynamic linear models by West 
and Harrison (1997). This partial information transfer from the posterior 
distribution ensures that the moment-matched priors allocate most of their 
mass around the current marginal posterior means, but upon detecting a 
change-point, the dependencies among different models' parameters, skew- 
ness, curtosis and the other higher-order moments are all reset to their values 
prior to observing any data. Equation (1.1) represents a partially specified 
state evolution density where neither the exact form of the prior nor the 
time of occurrence of the change-points are given a priori. Specific choices 
for the prior density depend on the structure of the time series model being 
entertained and on the interpretation of its parameters. 

When no change-points are detected prior to observing the data Yi = m, 
under (1.1) the joint posterior density of the model's parameters is 



Here ^i(a) = y and for i = 2, . . . , N the sets ^(a) C y include the time 



current estimates of the model parameters and the hyper-parameters a. 

Implementation of (1.2) presents two related challenges. First, it is essen- 
tial to formulate the rejection sets (^(a), . . . , ^^(a)) in terms of a low- 
dimensional statistic of the data and of the hyper-parameters a. Second, 
it must be possible to derive the distribution of such a statistic over the 
sample space so as to provide at least a sequential approximation of the 
rejection sets for any value of a. A natural way to overcome these challenges 
is to view (^(aO, . . . ,^N(a)) as the a-level critical regions of a sequential 
change-point test based on an appropriate statistic. The transfer map is 
thus completely specified by the prior (1.1) together with a choice of this 
test statistic. 



(1.1) 




(1.2) 




series Yi which are inconsistent with their observed past y 1 



^ under the 
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1.1. A Kullback-Leibler change-point statistic. The Kullback-Leibler di- 
vergence [Kullback and Leibler (1951)] is a well-known information-theoretic 
criterion with many applications in statistics, such as density estimation 
[Hall (1987), Hastie (1987)], model selection [Akaike (1978), Akaike (1981), 
Carota, Parmigiani and Poison (1996), Goutis and Robert (1998)], exper- 
imental design [Lindley (1956), Stone (1959)] and the construction of un- 
informative priors [Bernardo (1979)]. Its geometric properties have been 
thoroughly explored by Critchley, Marriott and Salmon (1994). The change- 
point statistic proposed in this work has a complementary function to the KL 
divergence when used to support model selection. Instead of testing which 
of two competing model structures best predicts one given set of data, here 
we construct a statistic detecting whether the same parameter values could 
have likely generated two sets of data given a common model structure. 

As change-point test statistic we adopt a Kullback-Leibler divergence 



where the expectations in (1.3) are taken with respect to the posterior den- 
sity f (9i\y 0:l ) . The right-hand side of (1.3) is finite when the likelihood func- 
tion is bounded away from zero and infinity for all values of the model's 
parameters and when their posterior density is proper. In this case (1.3) is 
a nonnegative convex function measuring the discrepancy between the pos- 
terior densities f(0i\y 0:l ) and f(0i\y 0: ^ +1 ^) over their common support 0. 
Prior to observing the data i^+i = Vi+i, (1-3) is a random variable in which 
distribution under the null hypothesis depends on that of the future data 
Yi + \ via the likelihood P(Yi + i\9i,y 0:t ). The following sections focus on the 
interpretation and on the computation of (1.3). 

1.1.1. Interpretation of the KL statistic and of the change-points. The 
scalar hyper-parameter a of the joint posterior (1.2) has the interpretation of 
the type-1 error probability for the change-point test using the statistic (1.3). 
The rejection sets can be written explicitly as intervals ^fi(a) = (h,a, u i.a) 
representing the a-level highest probability interval for the random variable 
(1.3) under the hypothesis of no change over period i. 

When a is low and (1.3) lies below the value Zj jQ ,, the likelihood of the 
observed data is almost a constant in the parameters 9i over the range of 
their current posterior density. In other terms, the parameter values max- 
imizing the likelihood of the observed yi+i conditionally on the past data 
yO-.i are gi ven a l m ost zero probability by the posterior distribution under 




(1.3) 



\og{E eAy0 .,{P{y l+1 \6 u y^))) 
-E eily0:i (log(P(y i+l \e h y°- A ))), 
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the hypothesis of no change. If the change-point statistic lies above the 
parameter values maximizing the likelihood of the data Ui+i are associated 
to substantial values of the current joint posterior density, but they are far 
from its global maximum. In this case the joint posterior density of all data 
y 0: ( i + 1 ) under the hypothesis of no change is bimodal, indicating that the 
latest batch of data yt+i are not adequately explained by the current pa- 
rameter values. In both cases the value of the statistic (1.3) indicates that, 
in light of the data y 0: (* +1 ), the posterior density of the model's parameters 
f(6i\y°'^ +1 ^) significantly departs from its assumed form under the hypoth- 
esis that sequential Bayesian updating is adequate. 

When a = 0, no change-point is ever detected, so that the model's pa- 
rameters are updated sequentially only via Bayes' rule. On the other end, if 
a = 1 , a change in the parameter values is systematically detected at every 
time point. In this second limiting case the method proposed in this work 
is equivalent to a fully parametric first order Markov state-space model in 
which state evolution equations have the form (1.1). 

1.1.2. Computation of the change-point statistic. The test statistic (1.3) 
is similar in spirit to the cumulative Bayes factors proposed in West (1986) 
and West and Harrison (1986), with the practical advantage that the com- 
putation of marginal likelihoods is not required. However, in general, neither 
the value of (1.3) nor the rejection sets (^(a), • . • , \J/jv(oO) may be available 
in closed form, so that numerical approximations may be required. In these 
cases, at each time period these approximations can be calculated without 
incurring in additional computational cost using a sequence of parameter 
values {^|™}^f = i generated using a Markov chain Monte Carlo algorithm 
[Gelfand and Smith (1990), Smith and Roberts (1993), Tierney (1994)] hav- 
ing as its target the current posterior probability density. Using this tech- 
nique, the value of (1.3) is approximated by the average 

(1.4) KL(y°^)^log( ^= lPT+1 ) - £m=il°g(Pi+i) 

K ' \ M J M 

where p^\ \ = -P(?/j+i|#™, y 0:l ) is the likelihood of the data yi+i given the 
parameter values 6™ and the past data y 0: \ Using (1.4), the null distribution 
of (1.3) can be approximated as follows: 

(i) for each draw #™ generate a pseudo-realization y^+i using the joint 
sampling distribution P(Yi + i\9™,y° A ); 

(ii) compute the statistic KL(j/m' t+1 '), where = (y , . . • , 2/i, 2/|+i), us- 
ing its Monte Carlo approximation (1.4). 

The empirical distribution of the sequence {KL(ym^ +1 '' )}m=i approximates 
that of the KL statistic (1.3) under the hypothesis of no change. Therefore, 
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the empirical (§,1 — §)th percentiles of the sequence {KL(y m )}m=i 
approximate the rejection sets vE^a) = (li, a ,Ui t0l ) for any given value of a. 

1.2. Change-point test power and sample size. When the time series 
{Yi}f =l have substantially different lengths, the power of the change-point 
test based on the KL statistic is theoretically unchanged. For any value 
of a, this invariance is ensured by the behavior of the posterior distribu- 
tion at the denominator of (1.3). When the data Yi + \ = yj + i carries a large 
amount of information about the coefficients of model P(Yi + \ y 0:t ), their 
joint posterior distribution under the hypothesis of no change concentrates 
by a corresponding large amount, so that the distribution of the KL diver- 
gence concentrates over large values. If li+i is not expected to carry much 
additional information about the model parameters, for instance, due to 
its small sample size nj+i, the null distribution of the KL discrepancy is 
concentrated over small nonnegative values. This mechanism represents an 
automatic adaptation of the critical region \I/j + i(a) of the KL test, ensuring 
that its power does not vary with the sample size of the data sequentially 
accrued over time. 

Although this property is sufficiently clear in theory, it is an open question 
whether the power of the test is significantly affected when our method is im- 
plemented using the MCMC approximations outlined above. Here we briefly 
investigate this issue by simulation using a conjugate Bernoulli model. One 
hundred thousand simulations were run. For each simulation, two sample 
sizes ni and ri2 were independently generated as independent draws from 
a discrete uniform distribution on the integers (1,...,M) with M = 100. 
A success probability ir was also independently generated for each simu- 
lation using a uniform distribution on the interval (0,1). Conditionally on 
(nx, n,2, it), two independent samples of Bernoulli random variables were gen- 
erated, Y\ ~ Ber(7r, n\) and Y% ~ Ber(-7r, n<i). For each simulation, a sample of 
size 5000 was generated from the conjugate posterior Beta(l + YTj=\ > 1 + 
n\ — X^jii^ij) t° compute the Monte Carlo approximation of the KL 
statistic and of the end-points of its 95% probability interval under the 
hypothesis of no change. For this simulation study the type-1 error prob- 
ability of the test was fixed to a = 0.2. Under this sampling scheme, with 
n\ = Hi — 1 for i = 1,2, the random variables jfj^l an< ^ M-i are indepen- 
dent and approximately uniform on (0,1), so that the distribution of the 
statistic Z = log(^t) is approximately standard double-exponential. If the 
power of the KL change-point test is in practice not affected by the values 
of (ni, 712,71"), the distribution of Z for the group of simulations where the 
hypothesis of no change is accepted should be standard double exponential. 
Figure 1 represents with a solid line the empirical cumulative distribution 
function (CDF) of Z for the 79,743 simulations where a significant change 
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was not detected. The two dashed lines in the same figure represent the 
point-wise 99% probability intervals for the CDF of a standard double ex- 
ponential random variable. Since at each point the former CDF always lies 
within its 99% interval, this simulation study suggests that for the Bernoulli 
model the power of the change-point test is not significantly affected by 
different sample sizes (71-1,77.2)- 

1.3. Sequential fitting and change-point testing algorithm. This section 
provides a summary of the computational steps involved by the dynamic 
modeling method illustrated so far. Despite not addressing any model-specific 
issues such as the explicit form of posterior distributions, we aim at provid- 
ing here a general blueprint for implementing our method starting from the 
first sample y\\ 

(i) Upon observing the data y±, derive the posterior density 

/(^VM^ilWyil^yo), 



1 
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Fig. 1. The solid line represents the empirical cumulative distribution function (CDF) 
of the random variable Z for the 79,743 simulations where the hypothesis of no change 
was accepted. The dashed lines represent the approximate end-points of the point-wise 
99% probability intervals for the CDF of a standard double exponential random variable. 
Acceptance of the hypothesis of no change did not cause a significant departure of the 
distribution of Z from that of a standard double exponential distribution, suggesting that 
the power of the change-point test is not significantly affected by different sample sizes 
(ni,n 2 ). 
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where 8q represents an estimate of the parameter values as reflected by 
the initial conditions yo- 

(ii) Having observed data y^, compute the statistic KL(y 0;2 ) and its rejec- 
tion interval ^i(a) = (h t a,ui,a) as described in Section 1.1. 

(hi) If l 1>a < KL(y 0:2 ) < m ; a , no change-point is detected. In this case the 
prior density for 62 is the posterior at point (i) and the posterior density 
for 62 derived using B ayes' rule is 

f(62\y°- 2 ,a) oc f(e2\y OA )P(y2\02,y O:1 ). 

(iv) Otherwise, match the first two posterior moments Q\ to those of the 
prior for 62 and again apply Bayes' rule, deriving the conditional pos- 
terior density 

/(# 2 |#i,y 0;2 ,«) cx fc(02|0i)P(lfe|02, y° :1 )- 

In case (iv) above, the sequentially estimated change-point process up to 
and including times (1,2) reports one change at time 2. Consistently with 
the interpretation of the KL statistic, the model parameters are updated 
using all data starting from the last detected change-point, if any. When 
a change is detected at level 1 — a, the new parameter values are updated 
using their conditional posterior distribution under the transfer prior (1.1) 
and the likelihood of the latest batch of data. 



1.4. Change-point KL statistic for exponential family models. Several 
properties of the KL divergence for exponential family models have been 
explored by McCulloch (1988). Here we show that in this circumstance also 
the divergence (1.3) has a closed form. In this case the algorithm illustrated 
in Section 1.2 is simplified, as only the critical intervals ^fi(a) = (^ jQ ,Uj jQ ) 
need being approximated. Without loss of generality, in what follows we as- 
sume that no change-point is detected prior to period i. Also, we let Yi be a 
1 dimensional sample of conditionally independent observations with length 
rii and joint density [Diaconis and Ylvisaker (1979)] 

(1.5) PiWi) = f[a{Y hJ )e Y ^- b ^\ 

where 9{ is a scalar canonical parameter. Diaconis and Ylvisaker (1979) show 
that each element of Yi has mean and variance 

Using the prior 

f(9 l \n ,y ) = c(n ,S )e s ° e >- n ^\ 
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where 5*0 = n^yo for scalar no and yo> the posterior for 6i given the past data 
y°' A has conjugate density 

(1.6) /(^|n(i),y O:i ) = c(?i(i),S(i))e n(i)((SW/nW)ei - fe(0l)) ' 

where n{i) = ^}=o n i' ^(*) = S}=o n i%' anc ^ % represents the arithmetic 
mean of sample yj. Using the results of Gutierrez- Peha (1997), the posterior 
mean and variance of 9{ are 

W(0\ n qf\\ 9H(n(i),S(i)) dH(n(i),S(i)) 
E{e i \n{t),S{i)) = , V{9i\n(i),S{i)) = -— > 

and the posterior mean and variance of the function b(Qj) are 

dH{n{i),S{i)) 



E(b(0i)\n(i),S(i)) 



V(b(ei)\n(i),S(i)) 



dn{i) 
dH(n(i),S(i)) 



dn(i) 2 

where H(n(i),S(i)) = — log(c(n(i) , S (i))) . Using these results, we derive the 
following explicit form for the KL divergence (1.3): 

Theorem. When the posterior density for the coefficients B{ has form 
(1-6), given the data up to and including yi+i, the Kullback-Leibler statistic 
(1.3) is 

KL(v ^) loJ 3(0) A 3 dH(n(i),S(i)) 

( ) " l0g Un« + l),^ + l))J-^ 1 dS(i) 

1.7 

^ dH(n(i),S(i)) 
+ Hl+1 dnj) ' 

where the terms on the right-hand side of (1.7) are defined above. 

Proof. By letting the posterior densities f(0i\n(i),S(i)) and f(9i\n(i + 
1), S(i + 1)) have form (1.6), the KL (1.3) becomes 

KLfo0:,,+1,) = ^( ^g ) - SMm) + - +lEibim 

For exponential family models, the expectations E{6j) and E(b(8i)) with 
respect to f(0i\y ' 1 ) are reported above. By substituting these expressions, 
equation (1.7) obtains the following. □ 

Example 1.1. When Y{ is a Gaussian random variable with mean \i{ 
and precision Aj, its distribution can be written in the form (1.5) using the 
two-dimensional statistic 

Y* = [Y U Y?] 
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and the canonical parameter 

Q% = 02,i] 
with 

a{Y*) = {2*y l l\ 



W = -Iio g (0 2)i )-Si. 

& t>2,i 



The conjugate prior for (/Xj, A,) is Normal-Gamma ^(^7, A,(2a — 1)) Ga(Aj|a, 
/3) with coefficients a > 0.5, f3 > 0, 7 € 1Z and normalizing constant [Bernardo 
and Smith (2007)] 



c(n ,5 ) 



2vr\ 1 / 2 S^ o/2 /2 



n J r((no + l)/2)' 



where n = 2a- 1, = [2/^0,2/2,0] = b, 2^T + 7 1' S h0 = n o2/i, and 5 2,o = 
n oV2 0- Upon observing the realization (y\, . . . ,yi), the normalizing constant 
of the corresponding conjugate posterior is 



c(n(i),5(i)) 



2vr \ 1/2 5(2, i)*(i.0/2/ 2 



n(i); r((n(i) + l)/2)' 



where n(z)=n + i, 5(1, i) = 5 1)0 + J2)=i Uj and 5(2,i) = 5 2 ,o + Z)}=i?/|- 
When also yi + \ is observed, using (1.7), the KL statistic can be written as 



KL( ?/ 0: ( i+1 ))=log^r 
+ log 



n(t + l) + 1 



5(2, t)g(i.0/2 
5(2, i + \)S{W)/2 



ym 5(2,i) + 2n(i) +i 



n(i) + 1 



y*+i 



+ -loe 

2 e 



n(i + 1) 
n(i) 



log 



5(2, i; 



n(z) + l \ ar((w(0 + l)/2) 
2 y are(i) 



Example 1.2. Let be a sample of size of conditionally independent 
Bernoulli random variables with success probabilities {iti}f =l . The canonical 
representation of the Bernoulli probability mass function obtains, by letting 
Qi = log(yrT-), b(9{) = log(l + e 9i ) and a{Yi) = 1- The conjugate prior for 7Tj 
is Beta(5o,mo), where mo = no — So- Upon observing (yi, . . . , yj), the con- 
jugate posterior is Beta(5(i), m(i)), where S(i) = X^=o^i> n W = S}=o n i 
and m(i) = n(i) — S(i). When also yi+i is observed, the KL statistic (1.7) 
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has form 

\ni-Si 



KL(/<«>) = lo g ( n;L - ( " ( lt fc)rl -- (m(i) + 
— Si + i 



It' , 



T(S(i)) d(T(m(i))/T(S(i))) 



T(m(i)) dS(i) 
( d(T(n(i))T(m(i)))\ /,„, , , 

+ ni+1 \d^)^J /( r ( n W) r ( m W))' 

where m(i) = n(z) — 

Example 1.3. Let 3^ represent the random number of events of a given 
kind observed within a time interval (ti i,ti tTH ] of fixed length. For this ex- 
ample we assume that the latter is identical for all samples i = 1,...,N. 
Let the random times at which the events take place be distributed accord- 
ing to a homogeneous Poisson process with intensity Aj, so that the distri- 
bution of Yi is Poisson with parameter A* = Aj(tj n< — tj i). The canonical 
form of the Poisson distribution has parameter 6i =log(A*) and functions 
a(Yi) = y^,b(6i) = e . The conjugate prior for A* is Gamma with parame- 
ters Ga(So, n o) having mean uq and variance Upon observing (j/i, . . . , j/i), 
the conjugate posterior for A* is G&(S (i) , n(i)) with S(i) = Sq + X]}=i2/j> 
n(i) = no + i. When also yj+i is observed, using (1.7), the KL statistic has 
form 

S(i)n(i) s{ ^ 



KL(y 0: ( i+1 ))=log 



+ «|.» s w.))-S/W S(i) 



V as® /' v w v n(«) 

1.5. Effect of change-points on predictive densities. In this section we 
illustrate analytically the effect of detecting a change-point on the one-step 
ahead predictive density using the transfer prior (1.1) and a conjugate Gaus- 
sian dynamic linear model. For each value of i, in what follows we let the 
scalar random variable Yi be distributed as N(fj,i,af). Analogously to Ex- 
ample 1.1, the prior distribution for 9i = (ni,af) is taken as the conjugate 
Normal-inverse Gamma 

I u u ^ 



where 1 < i* < i is the time of the last detected change-point and (//**_ i , of*_i) 
represent the estimated mean and variance of the joint posterior density at 
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time i*. If i* = 1, (//o,o"o) represents a fixed initial condition. Here the prior 
density of the variance is 

f(a 2 \u al ,)= (^-i/^ 2 -2W2+1) -(^y^f) 

Under this formulation, the prior expectation of the mean is and that 

of the variance is JJ^—i ^ follows that the one-step ahead marginal 

predictive distribution is a noncentral Student-t. In absence of change-points 
prior to time i, the predictive density is 

, \ / i+lv+i 1 

(1-8) Y i+1 ~t v+i [Hi, 



i + 2 2 a 



S-2 / ' 



where 



^ = TT7^ + TT7 y( ' ) ' 

and (y^ 1:i ; Sn.jp represent respectively the sample mean and variance of the 
data y . If a change-point is detected by the KL statistic (1.3) at time 
1 < i* < i, under the transfer prior (1.1), the conditional predictive density 
is 

i-i* + 2v + i-i* + 1 1 



(1.9) Y i+1 



i-i* + 3 2 (or 2 )*/' 



where 



ft 2 )' = >-.)" + + T^^-i - 

Since the mean and variance of the noncentral Student-i random variable 
with density i„(/i,cr 2 ) are respectively equal to fi and to ^^cr 2 , equations 
(1.8) and (1.9) provide a characterization of the one-step ahead posterior 
predictive moments as a function of the time of the last detected change- 
point and of the inverse-Gamma prior coefficient v. For i* > 1 the predictive 
mean is less influenced by the sample mean of the data preceding the change- 
point, y l:ty% *~ l \ and it is more heavily influenced by y 1 *' 1 , that is, the sample 
mean of the data from the change-point on. When a change-point is detected, 
the predictive variance is larger with respect to the case of no change. Its 
relative increase is a decreasing function of the difference (i — i*), which 
measures how far in time the change-point occurred, and it is an increasing 
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function of the coefficient u, which measures the strength of the prior at the 
initial time. 

This behavior is consistent with the intuition that predictions ensuing 
from a dynamic time series model should discount the information content 
of remote data and focus on more recent data when significant dynamics 
occur. In absence of an autoregressive model structure, as in the present 
section, the distinction between remote and recent data is entirely left to 
the timing of the detected change-points. 

2. Analysis of multivariate EEG recordings. This section presents an 
application of the methods discussed above to estimating neural functional 
dynamics using multivariate electroencephalogram (EEG) recordings. The 
data analyzed here arise from a sequence of 80 identical tests each having 
length of approximately four seconds with sampling rate of 128 points per 
second. During each test, the same subject was to press a button when a 
green square appeared in a specific screen location [Makeig et al. (2002)]. 
Previous analyses of these data have emphasized aspects of time-dependent 
interactions among different EEG channels, such as an increased overall syn- 
chronization of different brain areas after presentation of the visual stimulus 
[Delorme et al. (2002)]. The multidimensional EEG time series are modeled 
here as a discrete time Gaussian stochastic process, in which randomness is 
thought of as arising from the intrinsic variability of the brain activity and 
from the presence of experimental artifacts. We describe the dynamic func- 
tional relationships among different brain areas using the time-dependent 
means and covariance matrices indexing the data likelihood. 

The 32 EEG channels record neural activity arising from seven function- 
ally distinct brain areas, that are the frontal (F), central (C), central-parietal 
(CP), parietal (P), temporal (T), parietal-occipital (PO) and occipital (O) 
lobes. At each time point the recording channels targeting each of the seven 
brain areas were averaged within each trial and then across trials so as to 
obtain a seven-dimensional time series. The rationale for this preprocessing 
is that recordings within each brain area exhibit similar patterns within and 
across trials so that, for the purpose of our analysis, averaging yields a lower 
dimensional signal less affected by channel-specific recording noise. These 
trial- averaged EEG recordings are represented in Figure 2. The activity of 
the different areas prior to the presentation of the visual cue are tightly syn- 
chronized, exhibiting oscillations of high amplitude around frequency 10 Hz 
and fast low-amplitude oscillations at 60 Hz. Due to their low amplitude, 
the latter are hard to see in Figure 2. The lower frequency oscillations are 
consistent with the so-called a band reflecting eye movements. The higher 
frequency and lower amplitude oscillations are due to the alternating current 
being used in this experiment, suggesting an imperfect electrode grounding. 
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Fig. 2. EEG recordings (blue), one-step ahead marginal posterior point predictions and 
95% posterior predictive intervals for each brain area (red). The estimated change-point 
times are marked on the horizontal axis of each plot. The two vertical lines represent 
respectively the average stimulus and response times. The brain activity is reduced roughly 
at half of the initial phase of the experiment and it increases when the cue is presented. 
The sharpest increases are detected in the frontal (F) and central (C) lobes, followed by 
the central-parietal (CP), parietal (P) and temporal (T) lobes. The estimated change in 
activity in the parietal- occipital (PO) and occipital (O) areas is far less pronounced. 
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The trial-averaged signal at time i, Yi, is modeled as Nj(fj,i, Sj). To derive 
Bayesian inferences for the mean vector and for the covariance matrix, we 
use the conjugate Normal- inverse Wishart prior: 



where 1 < i* < i is the time of the last detected change-point prior to time i. 
The marginal prior expectations are matched to the corresponding estimated 
marginal posterior moments at time i* — 1 consistently with (1.1). At time 
i these prior distributions are updated using Bayes' theorem, taking into 
account all data points within the interval [i* + Therefore, by combin- 
ing the KL test with a static Bayesian update, this dynamic model retains 
a memory of past mean and covariance estimates from the last detected 
change-point onward. From this perspective, this model can be thought of 
as a form of time-varing vector autoregression which at time i has order 



For this analysis, the initial conditions fio and £o were set respectively 
equal to the null vector and to the identity matrix. The hyper-parameter 
of the posterior density was set at a = 0.01, so as to detect only the most 
prominent changes. The number of degrees of freedom of the Inverse Wishart 
density is set so that predictive intervals of length consistent with the set 
value of a are not excessively inflated when a change is detected. The dis- 
tribution of the KL statistic and its value were approximated at each time 
i using the last 500 Gibbs sampler draws of the mean and of the covariance 
matrix. 

Along with the data, Figure 2 shows the one-step ahead marginal pos- 
terior point predictions and their 95% highest posterior predictive intervals 
for each of the seven brain areas. The estimated change-point times are 
marked along the horizontal axes. The predictions emphasize a downward 
shift in brain activity taking place roughly at half of the initial phase of the 
experiments, followed by a sharp increase corresponding to the cue presen- 
tation, a downward trend following the motor response and a stabilization 
of the EEG signals toward the end of the experiments. The first two change- 
points identify a transition during the first part of the experiment toward a 
state of more intense attention. The third to sixth change-points capture an 
abrupt increase in neural activity related to the presentation of the visual 
cue, whereas the last change-point indicates a return to a baseline activity. 
The sharp increases in the activity of the frontal and central areas during the 
generation of the response are consistent with their characterization as ex- 
ecutive and motor centers of the brain. The intermediate increase in activity 
of the temporal and parietal lobes, mainly involved in speech, hearing, mem- 
ory and in the integration of sensory inputs, reflects the mild involvement 



(2.1) 
(2.2) 
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of their functions in the execution of the task entailed by this experimental 
protocol. The mild response to the visual stimulus of the parietal-occipital 
and occipital areas, including the visual cortex, is somewhat surprising. An 
analogous analysis of the trial-averaged EEG data from the eight distinct 
channels recording from these two areas reveals a consistently higher activ- 
ity of the occipital channels with respect to the parietal-occipital ones but 
no significant change in response to the visual stimulus. 

Figure 3 depicts the estimates of the time-dependent variance and covari- 
ance functions for each brain area. Whole segments represent periods during 
which their respective 95% highest posterior intervals do not intersect zero. 
The estimated change-point times are marked on the horizontal axis of each 
plot, as in Figure 2. All estimated variances and covariances vary over time, 
indicating that a time-dependent covariance matrix is an appropriate mod- 
eling assumption for this data. The estimated covariances are almost always 
positive, suggesting that the activity of the seven brain areas is dynamically 
cooperative as found by Delorme et al. (2002). An unexpected feature of 
the estimated covariance functions is their spatial ordering over time, the 
strongest relationships being estimated between adjacent brain areas. Since 
neither in the Gaussian likelihood nor the priors (2.1)-(2.2) include a spa- 
tial component, these estimates suggest a close correspondence between the 
detected functional relationships and the anatomical structure of the brain. 

3. Estimation of a learning curve. The data analyzed in this section 
arises from a sequence of 55 trials during which a macaque monkey per- 
formed a location-scene association task [Wirth et al. (2003)]. The learning 
curve is represented by the time-dependent estimates of the trials' success 
probabilities. Smith et al. (2004) introduced a parametric state-space model 
for inferring the learning performance using longitudinal behavioral experi- 
ments. The learning curve is thereby modeled using univariate binary time 
series data along with a logit link for each trial's success probability and 
a Gaussian state evolution equation for the parameters' dynamics. In this 
section we use the same Bernoulli sampling distribution for the binary trial 
outcomes as in Smith et al. (2004) and we estimate the dynamics of its 
success probability over time using the semi-parametric method illustrated 
in Section 1. A first difference between our model and that of Smith et al. 
(2004) is that we do not use a nonlinear link function, thus imposing fewer 
constraints on the shape of the learning curve. A second difference is that 
the results of Smith et al. (2004) are based on a smoothing algorithm using 
both past and future data to obtain estimates at present times, whereas our 
method uses past observed values and simulated current data to update the 
distribution of the success probability. In the following analyses the success 
probability of the first trial was given a uniform prior, whereas the transfer 
prior (1.1) was implemented using a conjugate Beta prior. The data were 
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Fig. 3. Estimates of the time- dependent variance and covariance functions for the frontal 
(green), temporal (yellow), central (magenta), central-parietal (cyan), parietal (black), 
parietal- occipital (red) and occipital (blue) lobes. Whole segments represent periods dur- 
ing which their 95% posterior intervals do not intersect zero. The estimated change-point 
times are marked on the horizontal axis of each plot. The estimated covariances are al- 
most always positive and time-varying, representing different levels of cooperative activity 
of the seven brain areas over time. The covariance functions are also spatially ordered, the 
strongest relationships being estimated between physically adjacent brain areas. 
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analyzed under different values for the hyper-parameter a within the range 
(0.01,0.9), respectively requiring from strong to weak evidence for detecting 
a change-point. The distribution of the KL statistic under the null hypothe- 
sis of no change was approximated using ten thousand Monte Carlo samples 
from the Beta posterior distribution of each trial's success probability. For 
this data, the smoothed state-space estimates of Smith et al. (2004) indi- 
cate that with 90% confidence the success probability significantly exceeds 
its chance value 0.25 from trial 23 onward, whereas their unsmoothed es- 
timates indicate that the chance value is significantly exceeded from trial 
27 onward. Figure 4 shows our estimates of the success probabilities under 
the four selected values of a = 0.9, 0.5, 0.1, 0.01. From these estimates we 
conclude that learning has effectively taken place from trial 29 onward, that 
is, after observing a total of 7 successes yielding an empirical cumulative 
success rate of 0.24. Figure 4 also compares our dynamic estimates with the 
empirical cumulative proportion of successful trials, which is represented 
by asterisks. As the value of a decreases, so does the number of detected 
change-points. In particular, under the uniform prior for the initial success 
probability when a < 0.1, our estimates of the learning curve are roughly 
equivalent to the empirical proportion of cumulative successes. 

4. Dynamic modeling of functional neuronal networks. This example il- 
lustrates the application of the method presented in Section 1 in the context 
of a model for networks of spiking neurons. During the experiments analyzed 
here, the neural activity of a small section of a sheep's temporal cortex is 
recorded in vivo on a millisecond time frame using a multi-electrode array 
[Kendrick et al. (2001)]. The goal of these experiments was to investigate in 
detail the activity of brain areas associated with memory. Along each of 77 
disconnected experiments, a sheep is shown either a blank screen or two im- 
ages. In the latter reward is given when one of a set of "familiar faces" 
is correctly identified. It is important to note that, even within small brain 
areas, these experimental techniques only record the activity of a relatively 
small fraction of neurons. Therefore, these data do not allow reconstructing 
direct physical interactions among neurons but only functional relationships 
among relatively distant recording electrodes. 

Introductions to the neuronal physiology and to neuronal modeling are 
presented in Fienberg (1974) and Brillinger (1988). Recent surveys of the 
state-of-the-art in multiple spike trains modeling can be found in Iyengar 
(2001), Brown, Kass and Mitra (2004), Kass, Ventura and Brown (2005), 
Okatan, Wilson and Brown (2005), Rao (2005) and Rigat, de Gunst and 
ven Pelt (2006). Dynamic point process neuronal models based on fully 
parametric state-space representations have been proposed by Eden et al. 
(2004), Truccolo et al. (2005), Brown and Barbieri (2006), Srinivansan et al. 
(2006) and Eden and Brown (2008). 
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Fig. 4. Macaque monkey binary data and semi-parametric estimates of their time-de- 
pendent success probabilities using a — 0.9, 0.5, 0.1, 0.01. The binary data are represented 
as vertical ticks along the lower and upper horizontal axes. Asterisks represent the cumu- 
lative proportion of successful trials. The sequence of estimates of the success probabilities 
describe the macaque 's learning curve over time. The first trial at which the learning curve 
lies above its chance level 0.25, indicating that learning has effectively taken place, is num- 
ber 29. Lower values of a require more extreme values of the KL statistic for detecting 
change-points, making our estimates of the learning curve progressively closer to the em- 
pirical cumulative success rates. 



4.1. Binary network model. In what follows each element of the exper- 
imental time series \Yi\JLi ls ^i,fc,tj jU) = 1 if neuron k fires at time ijj(j) 
during trial i and Y^^. . = otherwise with = l,...,m. We model the 
joint sampling distribution of the multiple spike train data for trial i, Yi, as 
a Bernoulli process with renewal [Rigat, de Gunst and ven Pelt (2006)]. The 
joint probability of a given realization yi is 

K 

(4.1) P( Yl = y l \n l )= ]J lK^i' W 
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For model (4.1) to be biologically interpretable, the firing probability of 
neuron k at time during trial i, 71"^^.,...., is denned as a one-to-one 

nondecreasing mapping of a real-valued voltage function t^fc,^ onto the 

interval (0,1). The function u^fc,^ represents the unnormalized difference 
of electrical potential across the membrane of neuron k at time tijU)- Let 
T «,fc,ti be the last spiking time of neuron k prior to time ttj(t) during trial 
i, that is, 



1, if X] Fi > fe > r = or t iji(i) 

r=l 

max{l < r < t itjty{) : Y^ k>T = 1}, 
otherwise 



and the voltage function is modeled as 

(4.2) Vi,k,t i>H{) ='^2^i,k,l ^2 



1=1 w=T i*,t i>m 



The spiking probabilities are linked to (4.2) via the logistic mapping 



The coefficients /3j & / represent the strength of the functional relationship 
from neuron I to neuron k during trial i. When fti^j is positive during trial 
i, the firing activity of neuron I promotes that of neuron k, whereas when it is 
negative, firing of / inhibits that of k. When neurons I and k are physically 
connected to each other, the coefficients f3i t k,i and fy^j represent direct 
functional connections. When the two neurons are not directly connected to 
each other, these network coefficients summarize a functional relationship 
possibly arising from a long chain of neurons in which activity cannot be 
currently recorded by the MEA technique. The coefficients f3i k,k represent 
the spontaneous spiking rate of neuron k during trial i. The last summation 
term in equation (4.2) indicates that the membrane potential of a neuron is 
assumed to be influenced only by the spiking activity of the other neurons 
during its last inter-spike interval. In this simple model we do not take into 
account the occurrence of leakage currents across the neuronal membrane 
[Plesser and Gerstner (2000)], so that the effect of the spikes produced by 
neuron I on the voltage function does not decrease over time. 

For each trial i = 1, . . . , N we use a Metropolis sampler to produce ap- 
proximate posterior inferences for the K 2 model parameters. For each exper- 
iment, we run a neuron- wise random scan update with independent Gaussian 
random walk proposals for twenty-five thousand iterations. The initial prior 
for the parameters of all experiments is Gaussian with zero mean, standard 
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Fig. 5. Each dot in the left panel marks the number of recorded spikes of the 7 most active 
electrodes for each millisecond of the 77 experiments. Each dot in the right panel marks 
the proportion of milliseconds during which each electrode recorded a spike during each 
experiment. The range of these mean firing rates is 0. 02-0. H, reflecting the low overall 
spiking rates typical for this type of recording. Clusters of points associated to relatively 
high mean spiking rates suggest that the underlying neurons may be mostly connected via 
mutually excitatory functional relationships. 



deviation 1 and zero covariance for all pairs of neurons. Conditionally on 
the data y 0:l and on the current posterior estimates, upon observing the 
outcome of the ith + 1 experiment, y^+i, we use the KL statistic (1.3) to 
test whether a significant change occurred in any of the model's parameters. 
The occurrence of such changes and the corresponding parameter estimates 
indicate statistically significant variations of different aspects of the neural 
activity. 

4.2. Analysis of sheep multiple spike trains. In this section we analyze 
the spiking activity of the 7 most active electrodes among the 64 record- 
ing channels. The plot on the left in Figure 5 shows the number of spikes 
recorded from these 7 electrodes along all 77 experiments. The panel on the 
right shows the mean spiking rates for each electrode and experiment, which 
reflect the overall low spiking rates typical of this type of measurement. The 
co-occurrence of relatively high firing rates for all electrodes suggests that 
the most prominent connections among the underlying neurons may be mu- 
tually excitatory functional relationships. 

Table 1 displays summaries of the point estimates of the network coef- 
ficients across all experiments. Each cell reports the proportions of experi- 
ments during which each of the network coefficients were found either signifi- 
cantly excitatory or inhibitory. Significance here denotes experiments during 
which both 95% end points of the posterior interval of a pair-wise functional 
connection lie respectively above or below zero. The self-dependence coeffi- 
cients on the main diagonal are always found significant and negative, rep- 
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resenting the well-known property of neural refractoriness. The excitatory 
functional connection from electrode 3 toward 6 is most prominent, being 
significant over approximately 63% of the experiments. The most promi- 
nent inhibitory connections are found significant over 62% and 68% of the 
experiments and they relate respectively electrodes 4 to 5 and 6 to 3. Note 
that the time series model (4.2) identifies a directed cyclic graph (DCG) of 
pair-wise functional relationships where the connections i — >■ j and j —> i are 
captured by distinct coefficients, so that the proportion of 3 — > 6 significant 
excitatory connections and that of 6 — > 3 significant inhibitory connections 
are not constrained to add up to one. Figure 6 illustrates in detail the point 
estimates and the 95% highest posterior intervals for the most prominent 
excitatory connection, 3—^6, together with those of both electrodes' self- 
dependence and of the mostly inhibitory connection 6 — s> 3. The estimated 
correlation over experiments between the self-dependence coefficients ^3 
and those of /^g is —0.28 and that between /?6,6 and j3%$ is —0.29, suggest- 
ing that neural self-inhibition may tend to compensate for excitations and 
inhibitions supplied by the other recorded functionally connected cells. 

5. Discussion. This work is motivated by the challenges encountered in 
constructing time series models when the factors driving the dynamics of 
their parameters are not well understood. The semi-parametric method il- 
lustrated here provides flexible time-dependent estimates without relying on 

Table 1 

Relative number of experiments during which both end points of the 95% posterior 
interval for any of the pair-wise functional connection coefficients lie respectively above 
or below zero, identifying significant excitatory (left proportion) or inhibitory (right 

proportion) relations 



i\j 


1 


2 


3 


4 


5 


6 


7 


1 


0.00 1.00 


0.10 0.28 


0.28 0.35 


0.30 0.12 


0.30 0.30 


0.08 0.29 


0.08 0.38 


2 


0.01 0.46 


0.00 1.00 


0.32 0.35 


0.05 0.60 


0.25 0.25 


0.30 0.14 


0.41 0.12 


3 


0.25 0.13 


0.28 0.14 


0.00 1.00 


0.08 0.30 


0.25 0.12 


0.01 0.68 


0.08 0.28 


4 


0.09 0.40 


0.05 0.36 


0.34 0.14 


0.00 1.00 


0.33 0.12 


0.08 0.13 


0.10 0.36 


5 


0.28 0.10 


0.08 0.42 


0.51 0.13 


0.08 0.62 


0.00 1.00 


0.08 0.28 


0.25 0.25 


6 


0.10 0.39 


0.10 0.13 


0.63 0.00 


0.08 0.51 


0.05 0.30 


0.00 1.00 


0.08 0.40 


7 


0.09 0.41 


0.10 0.14 


0.30 0.28 


0.08 0.14 


0.08 0.28 


0.05 0.39 


0.00 1.00 



The self-dependence coefficients on the main diagonal are always found significant and neg- 
ative, representing the well-known property of neural refractoriness. Bold entries represent 
functional connections which are found significant over more than 50% of the experiments. 
The excitatory functional connection from electrode 3 toward 6 is most prominent, being 
significant over approximately 63% of the experiments. The most prominent inhibitory 
connections relate electrode 4 to 5, which is significant over 62% of the experiments, and 
electrode 6 to 3, which is found significant over 68% of the experiments. 
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Fig. 6. Point estimates and 95% highest posterior intervals for the self-dependence pa- 
rameters of electrodes 3 and 6 (main diagonal) and of their pair-wise functional connec- 
tions @3 t e and /3e,3- These two electrodes exhibit a comparable level of refractoriness over 
all experiments. The estimated correlations over experiments between the self-dependence 
coefficients Ps,^ and those of ,03,6 is —0.28 and that between j3e,6 and j3e,3 is —0.29, sug- 
gesting that neural self-inhibition may tend to compensate for excitations and inhibitions 
supplied by functionally connected cells. 



explicit modeling of these dynamics. For exploratory data analyses, such as 
those presented in Sections 2, 3 and 4, these estimates may suffice to address 
specific scientific questions. Otherwise, appropriate measures of dependence 
between these time-dependent estimates and experimental factors of interest 
provide a principled basis for more precise formulations of the parameters' 
dynamics. Describing the exact form of such dependence measures is very 
much context-dependent and it lies outside of the scope of this work. 

A distinctive feature of the modeling approach proposed here is that it 
combines elements of sequential Bayesian learning and conditional frequen- 
tist inference along the lines of Guttman (1967), Box (1980), Berger, Brown 
and Wolpert (1994), Meng (1994), Gelman, Meng and Stern (1996), Berger 
and Bayarri (1997), Spiegelhalter et al. (2002), Bayarri and Morales (2003), 
Kuhnert, Mergesen and Tesar (2003) and Bayarri and Berger (2004), among 
others. A general treatment of such pragmatic combination of frequentist 
and Bayesian ideas for model criticism can be found in Chapter 8 of O'Hagan 
and Forster (1999). From this perspective, our method is a "Bayesianly justi- 
fiable" procedure [Rubin (1984)] because only those future unobserved data 
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that are consistent with the current conditional posterior distribution of the 
model's parameters are relevant for approximating the distribution of the 
KL change-point statistic (1.3). 

The latter reflects a notion of change-point as an observation which, on 
the basis of the chosen model with its prior and the observations accrued so 
far, is "surprising" from a predictive point of view. Note that this character- 
ization does not depend on the parametrization of the state space nor on the 
unobservable sample paths of latent states, but it depends only on the pre- 
dictives on observables. Defining models and their properties via their one 
step ahead predictive statements has been recommended, among others, by 
Geisser and Eddy (1979) and San Martini and Spezzaferri (1984) for predic- 
tive model selection, by Dawid (1984) in his prequential inference, by West 
and Harrison (1986) for monitoring the adequacy of Bayesian forecasting 
models and by Smith (1992) for comparing the characteristics of different 
forecasting models. More recently, optimal predictive model selection criteria 
have been proposed by Barbieri and Berger (2004). 

The results presented in Section 3 revealed a substantial dependence of the 
estimated learning curve with respect to the value of the hyper-parameter a. 
It is important to recall that this hyper-parameter measures how extreme a 
value of the KL statistic is needed for detecting a change-point. Therefore, a 
dependence of its corresponding estimated change-point process on the value 
of a is to be expected, with lower values of this hyper-parameter yielding less 
numerous change-points and vice versa. From this perspective, our method 
is not meant to be fully automatic and parameter estimates derived using 
different values of a should be inspected to gauge their sensitivity in the 
context of the specific time series model being entertained. 

In this work, a single change-point process common to all model's param- 
eters is used to define their conditional posterior distribution. Should the 
data provide evidence of changes of only some parameters, the posterior dis- 
tributions for the unchanging coefficients would not make the most efficient 
use of the data. It is important to note that while in principle any subset 
of model parameters can be associated to a distinct change-point process, 
the limitations for implementing multivariate change-point process inference 
within our framework are eminently practical. This is because marginal like- 
lihoods for each subset of model parameters having a different change-point 
process are required to approximate the distribution of their change-point 
test statistic. For classes of models where marginal likelihoods are available 
in closed form, this work can be extended by introducing a random variable 
identifying groups of coefficients sharing a common change-point process. 

Posterior simulation via Markov chain Monte Carlo algorithms has been 
used in this work to fit multivariate time series models and to approximate 
critical values of the KL statistic. Although the current implementation of 
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our method is operationally realistic, these computationally intensive meth- 
ods are in fact rather impractical for an iterative process of model formula- 
tion and criticism. Currently two directions are being pursued to improve the 
computational efficiency of our method. On the one hand, faster resampling 
methods such as particle filters [Doucet, De Preitas and Gordon (2001)] 
and approximate Bayesian computation [Marjoram et al. (2003)] can be 
adopted. Alternatively, analytical posterior approximations can be adopted 
[Tierney and Kadane (1986)]. For instance, in the context of sequential time 
series modeling, Koyama, Perez-Bolde and Kass (2008) recently proposed 
a Laplace-Gauss posterior approximation that obviates the use of cumber- 
some resampling techniques. 

Acknowledgments. The authors acknowledge the support of the Cen- 
tre for Research in Statistical Methodology (CRiSM) at the University of 
Warwick and of the Warwick Centre for Analytical Science during the devel- 
opment of this work. We wish to thank Arnaud Delorme for sharing the EEG 
recordings analyzed in Section 2. We also wish to thank Professor Jiangfeng 
Feng and collaborators for providing the multiple spike trains analyzed in 
Section 4. 

REFERENCES 

Akaike, H. (1978). On the likelihood of a time series model. The Statistician 27 217-235. 
Akaike, H. (1981). Likelihood of a model and information criteria. J. Econometrics 16 
3-14. 

Albert, J. H. and Chib, S. (1993). Bayes inference via Gibbs sampling for autoregressive 
time series subject to Markov mean and variance shifts. J. Bus. Econom. Statist. 11 
1-15. 

Barbieri, M. M. and Berger, J. O. (2004). Optimal predictive model selection. Ann. 

Statist. 32 870-897. MR2065192 
Barnard, G. A. (1959). Control charts and stochastic processes. J. Roy. Statist. Soc. Ser. 

B 21 239-271. 

Bayarri, M. J. and Berger, J. O. (2004). The interplay between Bayesian and frequentist 

analysis. Statist. Set. 19 58-80. MR2082147 
Bayarri, M. J. and Morales, J. (2003). Bayesian measures of surprise for outlier detec- 
tion. J. Statist. Plann. Inference 111 3-22. MR1955869 
Belisle, P., Joseph, L., MacGibbon, B., Wolfson, D. B. and du Berger, R. (1998). 

Change-point analysis of neuron spike train data. Biometrics 54 113-123. 
Bengtsson, T. and CAVANAUGH, J. E. (2006). An improved Akaike information criterion 

for state-space model selection. Comput. Statist. Data Anal. 50 2635-2654. MR2227324 
Berger, J. O. and Bayarri, M. L. (1997). Measures of surprise in Bayesian analysis. 

ISDS Discussion Paper 46, Duke Univ. 
Berger, J. O., Brown, L. and Wolpert, R. L. (1994). A unified conditional frequentist 

and Bayesian test for fixed and sequential hypothesis testing. Ann. Statist. 22 1787- 

1807. MR1329168 

Bernardo, J. (1979). Expected information as expected utility. Ann. Statist. 7 686-690. 
MR0527503 



28 



F. RIGAT AND J. Q. SMITH 



Bernardo, J. M. and Smith, A. F. M. (2007). Bayesian Theory. Wiley, Chichester, UK. 
MR1274699 

Box, G. E. P. (1980). Sampling and Bayes' inference in scientific modelling and robustness. 

J. Amer. Statist. Assoc. 143 383-430. MR0603745 
Brillinger, D. R. (1988). Some statistical methods for random processes data from 

seismology and neurophysiology. Ann. Statist. 16 1-54. MR0924855 
Brown, E. N. and Barbieri, R. (2006). Dynamic analyses of neural representations using 

the state-space modeling paradigm. In: The Cell Biology of Addiction (B. Madras, M. 

Von Zastrow, C. Colvis, J. Rutter, D. Shurtleff and J. Pollock, eds.). Cold Spring Harbor 

Laboratory Press, New York. 
Brown, E., Kass, R. E. and Mitra, P. P. (2004). Multiple neural spike train data analysis: 

State-of-the-art and future challenges. Nature Neuroscience 7 456-461. 
Buzsaki, G. (2004). Large scale recording of neuronal ensembles. Nature Neuroscience 7 

446-451. 

Cappe, O., Moulines, E. and Ryden, T. (2005). Inference in Hidden Markov Models. 

Springer, New York. MR2159833 
Carlin, B. P., Gelfand, A. E. and Smith, A. F. M. (1992). Hierarchical Bayesian analysis 

of changepoint problems. App. Statist. 41 389-405. 
Carota, C, Parmigiani, G. and Polson, N. (1996). Diagnostic measures for model 

criticism. J. Amer. Statist. Assoc. 91 753-762. MR1395742 
Chib, S. (1995). Marginal likelihood from the Gibbs output. J. Amer. Statist. Assoc. 90 

1313-1321. MR1379473 
Chib, S. (1998). Estimation and comparison of multiple change-point models. J. Econo- 
metrics 86 221-241. MR1649222 
Critchley, F., Marriott, P. and Salmon, M. (1994). Preferred point geometry and the 

local differential geometry of the Kullback-Leibler divergence. Ann. Statist. 22 1587- 

1602. MR1311991 

Dawid, A. P. (1984). Present position and potential developments: Some personal views. 
Statistical theory: The prequential approach. J. Roy. Statist. Soc. Ser. A 147 278-292. 
MR0763811 

Delorme, A., Makeig, S., Fabre-Thorpe, M. and Sejnowski, T. J. (2002). From single 

trial EEG to brain area dynamics. Neurocomputing 44 1057-1064. 
DlACONlS, P. and Ylvisaker, D. (1979). Conjugate priors for exponential families. Ann. 

Statist. 7 269-281. MR0520238 
Doucet, A., De Freitas, N. and Gordon, N. J. (2001). Seguential Monte Carlo Methods 

in Practice. Springer- Verlag, New York. MR1847783 
Eden, U. T. and Brown, E. N. (2008). Continuous-time filters for state estimation from 

point process models of neural data. Statist. Sinica. MR2468269 
Eden, U. T., Frank, L. M., Barbieri, R., Solo, V. and Brown, E. N. (2004). Dynamic 

analysis of neural encoding by point process adaptive filtering. Neural Comput. 16 971- 

998. 

Fearnhead, P. and Liu, Z. (2007). On-line inference for multiple change points. J. Roy. 
Statist. Soc. Ser. B 69 589-605. MR2370070 

Ferger, D. (1995). Nonparametric tests for nonstandard change-point problems. Ann. 
Statist. 23 1848-1861. MR1370310 

Fienberg, S. E. (1974). Stochastic models for single neuron firing trains: A survey. Bio- 
metrics 30 399-427. MR0359082 

Fruhwirth-Shnatter, S. (1995). Bayesian model discrimination and Bayes factors for 
linear Gaussian state-space models. J. Roy. Statist. Soc. Ser. B 1 237-246. MR1325388 



SEMI-PARAMETRIC NEURAL DYNAMICS 



29 



Fruhwirth-Shnatter, S. (2001). Markov chain Monte Carlo estimation ol classical 
and dynamic switching and mixture models. J. Amer. Statist. Assoc. 96 194-209. 
MR1952732 

Fruhwirth-Shnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer, 

New York. MR2265601 
Gamerman, D. (1992). A dynamic approach to the statistical analysis of point processes. 

Biometrika 79 39-50. MR1 158516 
Geisser, S. and Eddy, W. F. (1979). A predictive approach to model selection. J. Amer. 

Statist. Assoc. 74 153-160. MR0529531 
Gelfand, A. E. and Dey, E. K. (1994). Bayesian model choice: Asymptotics and exact 

calculations. J. Amer. Statist. Assoc. 56 501-514. MR1278223 
Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating 

marginal densities. J. Amer. Statist. Assoc. 85 398-409. MR1141740 
Gelman, A., Meng, X. L. and Stern, H. S. (1996). Posterior predictive assessment of 

model fitness via realized discrepancies. Statist. Sinica 6 733-807. MR1422404 
Ghahramani, Z. and Hinton, G. E. (2000). Variational learning for switching state-space 

models. Neural Comput. 12 831-864. 
Goutis, C. and Robert, C. (1998). Model choice in generalised linear models: A Bayesian 

approach via Kullback-Leibler projections. Biometrika 85 29-37. MR1627250 
Gutierrez-Pena, E. (1997). Moments for the canonical parameter of an exponential 

family under a conjugate distribution. Biometrika 84 727-732. MR1603960 
Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit 

problems. J. Roy. Statist. Soc. Ser. B 29 83-100. MR0216699 
Hall, P. (1987). On Kullback-Leibler loss and density estimation. Ann. Statist. 15 1491- 

1519. MR0913570 

Hamilton, J. D. (1990). Analysis of time series subject to changes in regime. J. Econo- 
metrics 45 39-70. MR1067230 

Hamilton, J. D. (1994). Time Series Analysis. Princeton Univ. Press, New Jersey. 
MR1278033 

Han, C. and Carlin, B. P. (2001). Markov chain Monte Carlo methods for computing 

Bayes factors: A comparative review. J. Amer. Statist. Assoc. 96 1122-1132. 
Hardle, W., Lutkepohl, H. and Chen, R. (1997). A review of nonparametric time series 

analysis. International Statistical Review 65 49-72. 
Harrison, P. J. and Stevens, C. F. (1976). Bayesian forecasting. J. Roy. Statist. Soc. 

Ser. B 38 205-247. MR0655429 
Hastie, T. (1987). A closer look at the deviance. Amer. Statist. 41 16-20. MR0882765 
Iyengar, S. (2001). The analysis of multiple neural spike trains. In Advances in Method- 
ological and Applied Aspects of Probability and Statistics (N. Bolakrishnan, ed.) 507- 

524. Taylor and Francis, New York. MR1977526 
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal 

of Basic Engineering 82 35-45. 
Kass, R. E., Ventura, V. and Brown, E. (2005). Statistical issues in the analysis of 

neuronal data. Journal of Neurophysiology 94 8-25. 
Kemp, K. W. (1957). Formulae for calculating the operating characteristic and the average 

sample number of some sequential tests. J. Roy. Statist. Soc. Ser. B 20 379-386. 
Kendrick, K. M., da Costa, A. P., Leigh, A. E., Hinton, M. R. and Peirce, J. W. 

(2001). Sheep don't forget a face. Nature 414 165-166. 
Kim, C. J. (1994). Dynamic linear models with Markov-switching. J. Econometrics 60 

1-22. MR1247815 



30 



F. RIGAT AND J. Q. SMITH 



Koyama, S., Perez-Bolde, L. C. and KASS, R. E. (2008). Approximate methods for 
state-space models: The Laplace-Gaussian filter. Submitted for publication. 

Kuhnert, P. M., Mergesen, K. and Tesar, P. (2003). Bridging the gap between different 
statistical approaches: An integrated framework for modelling. International Statistical 
Review 71 335-368. 

Kullback, S. (1997). Information Theory and Statistics. Dover, New York. MR1461541 
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. 

Statist. 22 79-86. MR0039968 
Lindley, D. (1956). On a measure of the information provided by an experiment. Ann. 

Math. Statist. 27 986-1005. MR0083936 
Loader, C. R. (1996). Change point estimation using nonparametric regression. Ann. 

Statist. 24 1667-1678. MR1416655 
Makeig, S., Westerfield, M., Jung, T. P., Enghoff, S., Towsend, J., Courchesne, 

E. and Sejnowski, T. J. (2002). Dynamic brain sources of visual evoked responses. 

Science 295 690-694. 

Marjoram, P., Molitor, J., Plagnol, V. and Tavare, S. (2003). Markov chain Monte 

Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 100 15324-15328. 
McCulloch, R. E. (1988). Information and the likelihood function in exponential families. 

Amer. Statist. 42 73-75. MR0936686 
McCulloch, R. E. and Tsay, R. S. (1994). Statistical analysis of economic time series 

via Markov switching models. J. Time Ser. Anal. 15 523-539. MR1263893 
Meng, X. L. (1994). Posterior predictive p- values. Ann. Statist. 22 1142-1160. MR1311969 
MlRA, A. and Petrone, S. (1996). Bayesian hierarchical nonparametric inference for 

change point problems. Bayesian Statistics 5 693-703. MR1425440 
Muller, H. G. (1992). Change-points in nonparametric regression analysis. Ann. Statist. 

20 737-761. MR1165590 
Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference by the 

weighted likelihood bootstrap. J. Roy. Statist. Soc. Ser. B 56 3-56. MR1257793 
O'Hagan, T. and Forster, J. (1999). Kendall's Advanced Theory of Statistics 2B. 

Arnold, London, UK. 

Okatan, M., Wilson, M. A. and Brown, E. N. (2005). Analysing functional connectivity 
using a network likelihood model of ensemble neural spiking activity. Neural Comput. 
17 1927-1961. 

Page, E. S. (1954). An improvement to Wald's approximation for some properties of 

sequential tests. J. Roy. Statist. Soc. Ser. B 16 136-139. MR0065879 
Page, E. S. (1955). A test for a change in a parameter occurring at an unknown point. 

Biometrika 42 523-527. MR0072412 
Page, E. S. (1961). Cumulative sum charts. Technometrics 3 1-9. MR0119344 
Plesser, H. E. and Gerstner, W. (2000). Niose in integrate-and-fire neurons: From 

stochastic input to escape rates. Neural Comput. 12 367-384. 
Rao, R. P. N. (2005). Hierarchical Bayesian inference in networks of spiking neurons. In 

Advances in NIPS 17. MIT Press, MA. 
Rigat, F., de Gunst, M. and ven Pelt, J. (2006). Bayesian modelling and analysis of 

spatio-temporal neuronal networks. Bayesian Anal. 1 733-764. MR2282205 
Robert, C. P., Celeux, G. and Diebolt, J. (1993). Bayesian estimation of hidden Markov 

chains: A stochastic implementation. Statist. Probab. Lett. 16 77-83. MR1208503 
Robinson, P. M. (1983). Non-parametric estimation for time series models. J. Time Ser. 

Anal. 4 185-208. 

Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the 
applied statistician. Ann. Statist. 12 1151-1172. MR0760681 



SEMI-PARAMETRIC NEURAL DYNAMICS 



31 



San Martini, A. and Spezzaferri, F. (1984). A predictive model selection criterion. 

J. Roy. Statist. Soc. Ser. B 46 296-383. MR0781890 
Shumway, R. H. and Stoffer, D. S. (1991). Dynamic linear models with switching. 

J. Amer. Statist. Assoc. 86 763-769. MR1147103 
Smith, A. C, Loren, M. F., Wyrth, S., Yanike, M., Hu, D., Kubota, Y., Graybiel, 

A. M., Suzuki, W. A. and Brown, E. M. (2004). Dynamic analysis of learning in 

behavioural experiments. Journal of Neuroscience 24 447-461. 
Smith, A. F. M. (1975). A Bayesian approach to inference about a change-point in a 

sequence of random variables. Biometrika 62 407-416. MR0381115 
Smith, A. F. M. and Roberts, G. O. (1993). Bayesian computations via the Gibbs sampler 

and related Markov chain Monte Carlo methods. J. Roy. Statist. Soc. Ser. B 55 3-23. 

MR1210421 

Smith, J. Q. (1990). Non-linear state space models with partially specified distributions 

on states. J. Forecast. 9 137-149. 
Smith, J. Q. (1992). A comparison of the characteristics of some Bayesian forecasting 

models. International Statistical Reviews 60 75-85. 
Spiegelhalter, D. J., Best, N. G., Carlin, P. B. and van der Linde, A. (2002). 

Bayesian measures of model complexity and fit. J. Roy. Statist. Soc. Ser. B 64 583- 

639. MR1979380 

Srinivansan, L., Eden, U. T., Willsky, A. S. and Brown, E. N. (2006). A state-space 
analysis for reconstruction of goal-directed movements using neural signals. Neural 
Comput. 18 2465-2494. MR2256113 

Stephens, D. A. (1994). Bayesian retrospective multiple-changepoint identification. App. 
Statist. 43 159-178. 

Stone, M. (1959). Application of a measure of information to the design and comparison 
of regression experiments. Ann. Math. Statist. 30 55-70. MR0 106528 

Tierney, L. (1994). Markov chains for exploring posterior distributions. Ann. Statist. 22 
1701-1762. MR1329166 

Tierney, L. and Kadane, J. B. (1986). Accurate approximations for posterior moments 
and marginal densities. J. Amer. Statist. Assoc. 81 84-86. MR0830567 

Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P. and Brown, E. N. 
(2005). A point process framework for relating neural spiking activity to spiking history, 
neural ensemble and extrinsic covariate effects. Journal of Neurophysiology 93 1074- 
1089. 

West, M. (1986). Bayesian model monitoring. J. Roy. Statist. Soc. Ser. B 48 70-78. 
MR0848052 

West, M. and Harrison, P. J. (1986). Monitoring and adaptation in Bayesian forecasting 

models. J. Amer. Statist. Assoc. 81 741-750. 
West, M. and Harrison, P. J. (1997). Bayesian Forecasting and Dynamic Models, 2nd 

ed. Springer, New York. MR1482232 
West, M., Harrison, P. J. and Migon, H. S. (1985). Dynamic generalised linear models 

and Bayesian forecasting. J. Amer. Statist. Assoc. 80 73-83. MR0786598 
Wirth, S., Yanike, M., Loren, M. F., Smith, A. C, Brown, E. M. and Suzuki, W. A. 

(2003). Single neurons in the monkey hippocampus and learning of new associations. 

Science 300 1578-1584. 



F. RIGAT AND J. Q. SMITH 



Department of Statistics and 

Centre for Analytical Science 
University of Warwick 
Coventry CV4 7AL 
UK 

E-MAIL: f.rigat@warwick.ac.uk 

J . Q . Smith® Warwick, ac . uk 



